A system for reducing the number of I/O
requests required to write data to a redundant
array of inexpensive disks (RAID) of a computer 
system including a host central processor unit 
and a memory buffer cache. The system in- 
cludes determinations for writing new data 
stored in the cache to the disk drives, as stripes, 
using the least number of I/O requests possible. 
The system uses the best of two alternative 
techniques in which the parity for the stripe can 
be generated. A first procedure determines the 
number of I/O requests that would be required 
to generate the parity data from the entire stripe 
including the new data to be written to the disk 
drives. A second procedure determines the 
number of I/O requests that would be required 
to generate the parity data from the new data to 
be written to the disk drives and the old parity 
data of the stripe. The system then aggregates 
in memory the blocks necessary to generate the 
parity data as either an entire stripe or as 
individual blocks using the technique which 
requires the least number of I/O requests as 
determined by the first and second procedures. 
FIELD OF THE INVENTION

This invention relates to the control of multiple 
disk drives for use with a computer system, and more 
particularly to a system for writing data to multiple disk
drives. 

BACKGROUND TO THE INVENTION 

It is a problem in the field of computer systems to
provide an inexpensive, high performance, high reliability,
and high capacity disk storage device. Traditional
high performance and high capacity disk devices
have typically used single large expensive disks
(SLED) having form factors in the range of 30.5 to
35.6 cm (12 or 14 inches).

The rapid acceptance of personal computers has 
created a market for inexpensive small form factor 
drives, such as 13.34 cm, 8.9 cm (5 1/4, 3 1/2 inch), or
smaller. Consequently, a disk storage device comprising
a redundant array of inexpensive disks (RAID)
has become a viable alternative for storing large
amounts of data. RAID products substitute many small
disk drives for a few very large expensive drives to
provide higher storage capacities and throughput.

Striping is one well known technique used with 
RAID to improve I/O throughput. Striping involves the 
concurrent transfer of data to an array of disk drives 
in "stripes." For example, if the RAID has five disk 
drives, a stripe would consist of five blocks of data,
and one block is transferred from each of the disk
drives. In a five-disk RAID, data can typically be
processed in about 1/5 the amount of time by transferring
one block of data to each of the disk drives
concurrently.

The drawback to replacing a single large disk
drive with several small disks is reliability, since there
is a much higher probability that one of the disk drives
in the array will fail, making the array inoperable.
However, by means of data redundancy techniques,
the reliability of RAID products can be substantially
improved. RAID products typically use parity encoding
to survive and recover from disk drive failures. Different
levels of RAID organizations using parity encoding
are currently known; see "A case for redundant arrays
of inexpensive disks," David A. Patterson et al.,
Report No. UCB/CSD 87/891, Dec. 1987, Computer
Science Division (EECS), Berkeley, CA 94720. In
RAID levels 4 and 5, one block of a stripe is reserved
for parity data. RAID level 4 stores all parity blocks on
the same drive; RAID level 5 distributes the parity
blocks over all of the drives in the array.

Parity data are generally generated by using an
exclusive OR (XOR) function. RAID parity protection
suffers from the inherent problem that the number of I/O
requests that must be serviced to maintain the parity
data is many more than would be the case with non-RAID
disks not using parity protection. For example,
to write a block of new data to disk, the following steps
must be performed: a) read the block storing the old
data from the disk; b) read the block storing the old
parity data from the disk; c) generate new parity data
from the old data, the old parity data, and the new
data; d) write the block storing the new data; and e)
write the block storing the new parity data. In other
words, the writing of a block of data in traditional RAID
products typically requires four times the number of
I/O requests that would be required with non-RAID
disks.
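
The parity computation in step c) can be illustrated with a
short Python sketch. It is only an illustration under stated
assumptions: the function names xor_bytes and updated_parity,
the two-byte block size, and the sample byte values are
hypothetical and do not come from the specification.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Step c): new parity = old parity XOR old data XOR new data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Tiny 2-byte "blocks", chosen only for illustration.
old_data = bytes([0x0F, 0xF0])      # block about to be overwritten (step a)
other_data = bytes([0x33, 0x55])    # an untouched data block of the same stripe
old_parity = xor_bytes(old_data, other_data)   # what step b) would read
new_data = bytes([0xAA, 0x01])

new_parity = updated_parity(old_data, old_parity, new_data)
```

In this sketch the result equals the XOR of the new data with
the untouched block, which is why the untouched block never
has to be read.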

Therefore, it is desirable to provide a system 
which reduces the number of I/O requests required to 
maintain parity data for the stripes of a RAID product.

SUMMARY OF THE INVENTION 

The invention in its broad form resides in an apparatus
for generating parity data for a plurality of disk
drives, as recited in claim 1. The invention also resides
in a method for generating parity data for a plurality
of disk drives, as recited in claim 6.

Described hereinafter is a system for reducing 
the number of I/O requests required to write data to a 
redundant array of inexpensive disks (RAID) of a 
computer system including a host central processor 
unit and a memory buffer cache. The system includes 
determinations for writing new data stored in the 
cache to the disk drives, as stripes, using the least 
number of I/O requests possible. The system uses the 
best of two alternative techniques in which the parity
for the stripe can be generated. A first procedure determines
the number of I/O requests that would be required
to generate the parity data from the entire
stripe including the new data to be written to the disk
drives. A second procedure determines the number of
I/O requests that would be required to generate the
parity data from the new data to be written to the disk
drives and the old parity data of the stripe. The system
then aggregates in memory the blocks necessary
to generate the parity data as either an entire
stripe or as individual blocks using the technique
which requires the least number of I/O requests as determined
by the first and second procedures.

BRIEF DESCRIPTION OF THE DRAWINGS 

A more complete understanding of the invention 
may be had from the following description of preferred 
embodiments, given by way of example only and to be 
understood in conjunction with the accompanying 
drawing wherein: 

Figure 1 is a block diagram of a computer system 
incorporating striping of the invention; 
Figure 2 is a block diagram of a RAID configura- 
tion incorporating striping of the invention; 
Figure 3 is a block diagram of a stripe including
data blocks and a parity block;
Figure 4 is a flow chart of a first procedure used
for stripe detection; and
Figure 5 is a flow chart of a second procedure
used for stripe detection.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

Referring now to the drawings, Figure 1 shows a
computer system generally indicated by reference numeral
1. The computer system 1 includes a central
processor unit or "host" 10 having primary temporary
data storage, such as memory 11, and secondary
permanent data storage, such as disk device 20. The
host 10 and disk device 20 are connected by a communication
bus 30. The computer system 1 also includes
a memory buffer cache (cache) 40, also connected
to the system bus 30.

The host 10 is generally conventional, and is of
the type that supports a multiple number of concurrent
users executing a wide variety of computer applications,
including database applications which use
the disk device 20 for storing data. During operation
of the computer system, the host 10 issues I/O requests,
such as reads and writes, to transfer data between
memory 11 and the disk device 20 via the bus
30 and cache 40.

The cache 40 allows the computer system 1 to
take advantage of the principles of locality of reference.
Presumably, the host 10 can access data stored
in a semiconductor memory cache considerably
faster than data stored on the disk drives 21-25. Data
frequently used by the host 10 are retained in cache
40 for as long as possible to decrease the number of
physical I/O requests to transfer data between the
host 10 and the disk drives 21-25. The cache 40 is organized
into a plurality of blocks 41 for storing data.
Blocks 41 store either modified or "new" data to be
written to the disk drives 21-25, or unmodified "old"
data read from the disk drives 21-25.

Host 10 "logical" I/O write requests store the new
data in the cache 40, and "physical" I/O write requests
transfer the data from the cache 40 to the disk drives
21-25, generally some time thereafter. While the new
data are stored in the cache 40, that is, before the
new data are written to permanent storage on the disk
drives 21-25, the new data are vulnerable to corruption
due to, for example, power or system failures. For
this reason, the cache 40 includes relatively expensive
non-volatile memory. For some applications, for
example, random access database applications,
where the amount of data read (old data) is much larger
than the amount of new data that is written, it may
be advantageous to partition the cache 40 into a larger
read cache and a smaller write cache. That portion
of the cache 40 which is used for storing old data retrieved
from the disk drives 21-25 can be less expensive
volatile memory, since that data, in case of failure,
can easily be restored from the disk drives 21-25.

For the purpose of illustration only, and not to limit 
generality, this invention will be described with refer- 
ence to its use in the disk device 20 which is organ- 
ized as a RAID device as described in the Patterson 
et al. paper. However, one skilled in the art will rec- 
ognize that the principles of the invention may also be 
used in storage devices organized in different man- 
ners. 

Figure 2 shows, in schematic block diagram form,
a disk device 20 organized in the RAID
level 5 fashion as described in the Patterson et al. paper.
The disk device 20 comprises a controller 29 connected
to the system bus 30, and a plurality of disk
drives 21-25.

The storage space of the disk drives 21-25 is
physically organized into, for example, sectors,
tracks, cylinders, and heads. However, in order to
simplify access by the host 10, the storage space of
the disk drives 21-25 is also organized into a set of sequentially
numbered blocks 41 compatible with the
block structure of the cache 40, generally indicated
with respect to disk drive 21 by reference numeral 41.
By using sequentially numbered blocks 41, the details
of the physical organization of the disk drives 21-25,
for example, the number of sectors per track, the
number of tracks per cylinder, and the physical distribution
of all data across the drives 21-25, do not need
to be known by the users of the host 10. In the preferred
embodiment, a block 41 of data is equal to the
minimal amount of data that can be conveniently
transferred between the cache 40 and the disk drives
21-25 with a single I/O request, for example, a sector.
Blocks 41 can also be larger, for example, an integral
number of sectors.

To improve the I/O throughput of the disk device 
20, the blocks 41 are further organized into yet larger 
sections of data, known as "stripes," generally indicat- 
ed by reference numeral 61. Striping techniques are 
well known in RAID, and generally involve the concur- 
rent reading and writing of data to several disk drives. 
With striping, a host I/O request distributes the data 
to be transferred across the disk drives 21-25. In the 
preferred embodiment, the stripe 61 is equal to the 
amount of data that is transferred when one block 41 
is transferred for each of the disk drives 21-25 in parallel,
for example five blocks 41.

RAID type devices, which include a large number
of disk drives, have a fairly high probability that one
of the disk drives in the array will fail. Therefore, parity
encoding is typically used to recover data that may
be lost because of a disk drive failure. For example,
one of the blocks 41 of the stripe 61, the "parity block,"
stores parity data generated from the other "data
blocks" of the stripe. The parity data stored in the parity
block are generated by using, for example, an exclusive
OR (XOR) function.
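
As a hedged illustration of how an XOR parity block allows a
lost block to be recovered, the following Python sketch builds
the parity for four data blocks and then reconstructs one of
them from the survivors. The helper name xor_blocks, the
three-byte block size, and the sample values are assumptions
made for the example only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four data blocks of a stripe (illustrative values), one per data drive.
data_blocks = [
    bytes([1, 2, 3]),
    bytes([4, 5, 6]),
    bytes([7, 8, 9]),
    bytes([10, 11, 12]),
]

# The parity block is the XOR of all data blocks of the stripe.
parity_block = xor_blocks(data_blocks)

# Simulate the failure of the drive holding data_blocks[2]:
# XOR-ing the surviving data blocks with the parity block restores it.
survivors = data_blocks[:2] + data_blocks[3:] + [parity_block]
recovered = xor_blocks(survivors)
```

Because XOR is its own inverse, the same helper serves for
both parity generation and recovery.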

RAID level 5 parity protection suffers from the inherent
problem that the number of I/O requests, reads
and writes, that must be serviced to write new data,
including parity data, is many more than would be
the case with non-RAID disks. For example, with reference
to Figure 3 showing a stripe 61 including data
blocks 41a-41d and a parity block 41e, new parity
could be generated as follows. Presume that block
41a stores new data to be written from the cache 40,
and that the blocks 41b-41d, which contain old data,
are not stored in the cache 40. A first technique that
can be used for generating the parity data for the parity
block 41e is to take the XOR of blocks 41a-41d.
Therefore, three additional I/O requests would be required
to read the three blocks 41b-41d from the disk
drives 21-25, before the new parity data can be generated.

Alternatively, as a second technique, the same
new parity data for the parity block 41e can also be
generated as follows: reading from the disks 21-25
the old data block which is going to be replaced by the
new data block 41a which is stored in cache 40; reading
the old parity block which is going to be overwritten
by the new parity block 41e; and generating the
new parity data from the old data, the old parity data,
and the new data. In other words, this second technique
only requires two additional I/O requests.
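
That the two techniques yield the same parity follows from the
properties of XOR. In the derivation below the symbols are
introduced only for illustration and do not appear in the
specification: D_a is the old content of block 41a, D_a' its new
content, D_b, D_c and D_d the contents of blocks 41b-41d, P the
old parity and P' the new parity.

```latex
\begin{align*}
P  &= D_a \oplus D_b \oplus D_c \oplus D_d
      && \text{(old parity)} \\
P' &= D_a' \oplus D_b \oplus D_c \oplus D_d
      && \text{(first technique)} \\
P \oplus D_a \oplus D_a'
   &= (D_a \oplus D_a) \oplus D_a' \oplus D_b \oplus D_c \oplus D_d
    = P'
      && \text{(second technique)}
\end{align*}
```

The last line uses only the fact that XOR-ing a block with
itself gives all zeros.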

However, in this example, if blocks 41b-41d were
already stored in the cache 40, no additional I/O requests
would be required if the first technique was
used. Therefore, prior to writing a new data block, it
would be an advantage to determine if the entire
stripe 61 is stored in cache 40 to reduce the number
of I/O requests. Even if not all of the blocks 41 of a
stripe 61 are stored in the cache 40, partial stripe
detection can reduce the number of I/O requests. For example, if
the old parity block 41e were already stored in the
cache 40, but blocks 41b-41d were not, then the second
technique would only require one additional I/O
request before parity data could be generated.

The purpose of entire and partial stripe detection,
according to the invention, is to determine how to aggregate
data in cache 40 sufficient to generate parity
data with as few I/O requests as possible, thereby
eliminating much of the latency associated with I/O
requests. If the size of cache 40 is larger than the size
of a stripe 61, there is an opportunity to eliminate a
significant number of I/O requests due to stripe detection.
If the amount of data stored in a stripe 61 is large,
there is less probability that an entire stripe 61 will be
present in cache 40. Even if the entire stripe 61 cannot
be detected, detecting partial stripes 61, according
to the invention, has significant benefits. Partial
stripes 61 can be converted to entire stripes 61 by
reading the missing data from the disk drives 21-25.
The benefit in this case is not so large as when entire
stripes 61 are detected, but partial stripe detection still
provides a significant performance improvement. If
the number of blocks 41 of a stripe 61 stored in cache
40 is small, it may require fewer I/O requests to generate
the parity data from the individual blocks of
the stripe as illustrated by the second technique
above. The present invention provides a system for
detecting entire and partial stripes 61, and furthermore,
provides for parity generation with the minimal
number of I/O requests.

In the preferred embodiment of the invention, the
system determines the number of I/O requests required
to generate the parity of a stripe 61 by using
two different techniques. As a first alternative, the
system determines how many I/O requests are required
to aggregate in cache 40 all of the data blocks
41 of an entire stripe 61, and to generate the new parity
data for the parity block 41 from all of the data
blocks 41 of the entire stripe 61.

As a second alternative, the system determines 
how many I/O requests would be required to generate 
the parity data only from the individual data blocks 41 
to be written and the old parity. 

Whichever of the two alternative determinations 
produces the least number of I/O requests will be the 
optimal way for generating the parity data of a stripe 
61. 

First with reference to Figure 4, there is shown a 
flowchart of a procedure 100 for determining how 
many additional I/O requests would be required to 
generate a parity block for the entire stripe 61. Then 
referring to Figure 5, there is shown a flow chart of a 
procedure 200 for determining how many I/O re- 
quests would be required to generate the parity data 
just from the individual blocks 41 to be written. Whichever
of these alternative ways of writing a stripe 61
yields the least number of I/O requests will then be
used to generate the parity of the stripe 61.

Now referring to Figure 4, procedure 100 begins
with step 110. For each of the data blocks of
the stripe, the computer determines if there are any
more data blocks to be processed for the stripe. If
there are not, this procedure is done. Otherwise, for
the next data block of the stripe, the computer
determines if an additional I/O request is required to
aggregate the data blocks of the stripe by proceeding
with step 120.

In step 120, the computer determines if the data 
block is stored in the cache. If the answer in step 120 
is yes, that is, the data block is in the cache, no ad- 
ditional I/O request is required, proceed with step 110. 
Otherwise, if the answer in step 120 is no, then one 
additional I/O request is required to read the data 
block into cache, proceed with step 110. Thus, the to- 
tal number of additional I/O requests required for ag- 
gregating the data blocks of an entire stripe is deter- 
mined. Note that this procedure does not need to ag- 
gregate into cache 40 the old parity data. 
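
Procedure 100 can be sketched in Python as follows. This is a
minimal illustration, not the patented implementation: the
function name full_stripe_reads, the use of string labels for
blocks, and the representation of the cache as a set of labels
are all assumptions made for the example.

```python
def full_stripe_reads(data_blocks, cached):
    """Count the reads needed to hold every data block of a stripe in cache.

    data_blocks: labels of the data blocks of the stripe (parity excluded).
    cached: set of labels whose current contents are already in the cache.
    """
    reads = 0
    for block in data_blocks:       # step 110: take the next data block, if any
        if block not in cached:     # step 120: is the block stored in the cache?
            reads += 1              # no: one additional read is required
    return reads


# The stripe of Figure 3, without its parity block 41e.
stripe_data = ["41a", "41b", "41c", "41d"]

# Only the new block 41a is cached: three reads are needed.
reads_when_only_new_block_cached = full_stripe_reads(stripe_data, {"41a"})

# The whole stripe is cached: no reads are needed.
reads_when_stripe_cached = full_stripe_reads(stripe_data, set(stripe_data))
```

The old parity block is deliberately not consulted, matching
the note that this procedure does not need it.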

Figure 5 shows the procedure 200 for determining
the number of I/O requests to generate parity from
the individual blocks of the stripe.

Beginning with step 210, for each of the data
blocks of the stripe, the computer determines if there
are any more data blocks to be processed for the
stripe. If there are not, this procedure is done. Otherwise,
for the next data block of the stripe, the computer
determines if any additional I/O requests are required
to aggregate the blocks necessary to generate
parity data.

In step 220, the computer determines if the data
block of the stripe stores new data to be written to the
disk drives 21-25. If the answer is no, that is, this data
block of the stripe does not store new data, then no
additional I/O requests are required, proceed with
step 210. Otherwise, proceed with step 230.

In step 230, the computer determines if the old
data block corresponding to the data block to be written
is stored in the cache. If the answer is no, that is,
the old data block is not stored in cache, then one additional
I/O request is required to read the old data
block from the disk drives 21-25. In any case, proceed
with step 240.

In step 240, the computer determines if the old
parity block of the stripe is stored in the cache. If the
answer is no, that is, the old parity block is not stored
in the cache, then one more I/O request is required
to read the old parity data from the disk drives 21-25.
In any case, proceed with step 210. The I/O
requests for generating parity from the individual
blocks of a stripe, as determined by procedure 200,
are summed to give the total number of I/O requests
that would be required to aggregate the data and parity
blocks of a partial stripe.
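
Procedure 200 can likewise be sketched in Python. The function
name partial_stripe_reads and its parameters are hypothetical.
One point is an assumption rather than something stated in the
text: the sketch counts the read of the old parity block at
most once per stripe, on the reasoning that after the first
read the parity block is in the cache.

```python
def partial_stripe_reads(data_blocks, new_blocks, cached_old, parity_cached):
    """Count the reads needed to update parity from individual blocks.

    data_blocks: labels of the data blocks of the stripe.
    new_blocks: labels of blocks holding new data waiting to be written.
    cached_old: labels of blocks whose old contents are in the cache.
    parity_cached: True if the old parity block is in the cache.
    """
    reads = 0
    parity_needed = False
    for block in data_blocks:           # step 210: next data block, if any
        if block not in new_blocks:     # step 220: no new data, nothing to read
            continue
        if block not in cached_old:     # step 230: old data must be read
            reads += 1
        parity_needed = True            # step 240 applies to this stripe
    if parity_needed and not parity_cached:
        reads += 1                      # assumption: parity is read only once
    return reads


stripe_data = ["41a", "41b", "41c", "41d"]

# New data in 41a, nothing else cached: old 41a and old parity are read.
reads_nothing_cached = partial_stripe_reads(stripe_data, {"41a"}, set(), False)

# Same write, but the old parity block 41e is already cached.
reads_parity_cached = partial_stripe_reads(stripe_data, {"41a"}, set(), True)
```

The two sample calls correspond to the two-request and
one-request cases of the second technique described earlier
for Figure 3.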

After each of these two procedures 100 and 200
has been performed for the stripe, parity is generated
according to the technique which yields the least
number of I/O requests.
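
The final selection can be sketched as a comparison of the two
counts. The names Technique and choose_technique are
hypothetical, and the tie-breaking rule (prefer the entire
stripe when both counts are equal) is an assumption; the text
does not say which technique is used in the case of a tie.

```python
from enum import Enum


class Technique(Enum):
    ENTIRE_STRIPE = "entire stripe"          # result of procedure 100
    INDIVIDUAL_BLOCKS = "individual blocks"  # result of procedure 200


def choose_technique(entire_stripe_reads, individual_block_reads):
    """Return the technique that needs the fewest I/O requests.

    Ties go to the entire-stripe technique (an assumption of this sketch).
    """
    if entire_stripe_reads <= individual_block_reads:
        return Technique.ENTIRE_STRIPE
    return Technique.INDIVIDUAL_BLOCKS


# Figure 3 example: 3 reads for the entire stripe versus 2 for individual blocks.
choice = choose_technique(3, 2)
```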

While there has been shown and described a preferred
embodiment, it is understood that various
other adaptations and modifications may be made
within the spirit and scope of the invention.



Claims 

1. An apparatus for generating parity data for a plurality
of disk drives, each of the plurality of disk
drives divided into a plurality of blocks, the plurality
of blocks further organized into a plurality of
stripes, each of the plurality of stripes including
one block from each of the plurality of disk drives,
one of the blocks of each stripe storing parity data
generated from the data stored in the other
blocks of the stripe, the apparatus comprising:

memory means for storing data, said
memory means partitioned into blocks compatible
with the block structure of the plurality of disk
drives;

means for identifying an updated memory
block storing data to be written to one of the plurality
of disk drives;

means for identifying a predetermined
stripe, said predetermined stripe including a disk
block to be overwritten by said updated memory
block;

first means for determining the number of
I/O requests required to aggregate in said memory
means all of the data, except the parity data,
of said predetermined stripe;

second means for determining the number
of I/O requests required to aggregate in said
memory means said block to be overwritten and
the parity block of said predetermined stripe; and

means for choosing said first means or
said second means to perform said I/O requests
by choosing the means which requires the fewest
number of I/O requests to aggregate in said memory
means the blocks of said identified stripe necessary
to generate the parity data of said predetermined
stripe.

2. The apparatus as in claim 1 wherein said memory 
means includes a non-volatile memory for storing 
data to be written to the plurality of the disk 
drives, and a volatile memory for storing data 
read from the plurality of disk drives. 

3. The apparatus as in claim 1 wherein said first and 
second means include means for determining if 
the data of the disk blocks of said predetermined 
stripe are stored in said memory means as mem- 
ory blocks. 

4. The apparatus as in claim 1 including means for 
generating the parity data of said predetermined 
stripe from all of the data, except the parity data, 
of said predetermined stripe, and means for gen- 
erating the parity data of said predetermined 
stripe from said updated memory block, said disk 
block to be overwritten, and said parity block of 
said predetermined stripe. 

5. The apparatus as in claim 4 wherein said means 
for generating the parity data includes a means 
for performing an exclusive OR function. 

6. A method for generating parity data for a plurality
of disk drives, each of the plurality of disk drives
divided into a plurality of blocks, the plurality of
blocks further organized into a plurality of stripes,
each of the plurality of stripes including one block
from each of the plurality of disk drives, one of the
blocks of each stripe storing parity data generated
from the data stored in the other blocks of the
stripe, the method comprising the steps of:

storing data in a memory means, said
memory means partitioned into blocks compatible
with the block structure of the plurality of disk
drives;

identifying an updated memory block storing
data to be written to one of the plurality of disk
drives;

identifying a predetermined stripe, said
predetermined stripe including a disk block to be
overwritten by said updated memory block;

determining by a first means, the number
of I/O requests required to aggregate in said
memory means all of the data, except the parity
data, of said predetermined stripe;

determining by a second means, the number
of I/O requests required to aggregate in said
memory means said block to be overwritten and
the parity block of said predetermined stripe; and

aggregating in said memory means the
blocks of said identified stripe necessary to generate
the parity data of said predetermined stripe
with the fewest number of I/O requests as determined
by said second or said first means.

7. The method as in claim 6 wherein said memory
means includes the step of storing data to be written
to the plurality of the disk drives in a non-volatile
memory, and storing data read from the plurality
of disk drives in a volatile memory.

8. The method as in claim 6 further including the
step of determining if the data of the disk blocks
of said predetermined stripe are stored in said
memory means as memory blocks.

9. The method as in claim 6 including the step of
generating the parity data of said predetermined
stripe from all of the data, except the parity data,
of said predetermined stripe, or generating the
parity data of said predetermined stripe from said
updated memory block, said disk block to be
overwritten, and said parity block of said predetermined
stripe, generating the parity data according
to the step which requires the fewest I/O
requests.



10. The method as in claim 6 further including the
step of generating the parity data by means of an
exclusive OR function.